Week 3.1 - The Cost of Every Prompt

What We'll Cover

Every time you send a message to an AI assistant, something physical happens: electricity flows through data-centre hardware on the other side of the world. This session puts numbers on that — not to make you feel guilty about using AI, but to help you think clearly about its environmental implications as a researcher and as a citizen.

We will look at energy consumption from the level of a single query up to the scale of global AI usage, at water — an environmental cost that receives less attention than carbon — and at why the numbers you read in the media should always be treated with careful scepticism.

This is a topic where the data are genuinely uncertain and where companies have strong incentives to present their products in the best possible light. Part of our job here is to understand why the numbers are uncertain, not just what they are.

⚠️ A Note on Numbers Throughout This Week

Almost all figures in this area come from one of three sources: company self-reports, independent academic estimates, or media extrapolations from those estimates. These often disagree substantially. Throughout these lessons we will flag where figures come from and what assumptions they rest on. Treat any specific number as an order-of-magnitude guide rather than a precise measurement.

🔍 The Transparency Problem

Before we look at any numbers, it is worth understanding why getting accurate figures is so difficult — because the answer shapes how we interpret everything that follows.

What Companies Don't Disclose

Major AI developers — OpenAI, Google, Meta, Microsoft, Anthropic — do not publish detailed energy or emissions data for individual models. What is typically missing:

Energy consumed per query (by model type or size)
Data centre locations and their grid carbon intensity
Water consumption at specific facilities
Hardware lifecycle data (manufacturing emissions)
Total annual compute for model training

Some companies publish aggregate sustainability reports, but these cover their entire operations, not AI specifically, and rely on accounting methods that can obscure more than they reveal.

What Researchers Can Estimate

In the absence of direct data, researchers use indirect methods to estimate AI's environmental footprint:

Hardware benchmarks: Measure energy use of known GPUs, estimate utilisation rates
Open-source proxies: Run open-weight models locally and measure directly
Architectural inference: Use known model sizes and FLOPs estimates to project energy
Company disclosures: Use reported figures with appropriate scepticism about representativeness

This is why published estimates for the same model can differ by a factor of 10 or more.

📄 Key Reading: The Hidden Costs of AI

Nature (2024): "Generative AI’s environmental costs are soaring — and mostly secret" — Nature's news team interviews researchers and industry figures on the difficulty of getting accurate data from companies. A good starting point for understanding the transparency problem.

⚡ Energy per Interaction

Let's start at the level of a single query. How much electricity does it take to generate one response?

📊 Energy Consumption by Task Type

The table below combines company-reported figures and independent measurements from MIT Technology Review (2025). Note the wide ranges — these reflect genuine differences in model size, hardware, and methodology, not just uncertainty.

Task	Energy Estimate	Everyday Comparison	Source / Confidence
Google Search query	~0.3 Wh	~11 seconds of a 100 W light bulb	Google; moderate confidence
ChatGPT text prompt (reported)	0.34 Wh	~13 seconds of a light bulb	OpenAI blog, June 2025; likely best-case
Gemini text prompt (reported)	0.24 Wh	~9 seconds of TV	Google White Paper, 2025; likely best-case
Large model text prompt (independent)	0.1–8 Wh	~0.4–30 seconds of a microwave (~1 kW)	MIT Technology Review, 2025; measured on open-weight models
AI image generation	~2–5 Wh	~7–18 seconds of a microwave	Various; moderate confidence
AI video generation (5 seconds)	~3.4 MJ (~944 Wh)	Approximately 1 hour of microwave use	MIT Technology Review, 2025; ~700× image cost

Key observation: Text generation and video generation are not in the same ballpark: they differ by two to three orders of magnitude (944 Wh is roughly 120× the 8 Wh upper text estimate and roughly 2,800× the 0.34 Wh reported figure). Company-reported figures for text prompts are at the lower end of independent measurements, likely reflecting optimised infrastructure not available to all deployments.
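The everyday comparisons in the table are a unit conversion: watt-hours to joules, divided by an appliance's power draw. A minimal Python sketch of that conversion follows. The wattages (10 W LED bulb, 100 W bulb or TV, 1,000 W microwave) are round illustrative assumptions, not figures from the sources.

```python
# Appliance power draws in watts. These are round illustrative assumptions,
# not values taken from the lesson's sources.
APPLIANCE_WATTS = {
    "LED bulb": 10,
    "100 W bulb or TV": 100,
    "microwave": 1000,
}


def seconds_of_use(energy_wh: float, watts: float) -> float:
    """Return how long a device drawing `watts` runs on `energy_wh` watt-hours."""
    joules = energy_wh * 3600.0  # 1 Wh = 3600 J
    return joules / watts


# Energy figures (Wh) taken from the table above.
TASKS_WH = {
    "ChatGPT text prompt (reported)": 0.34,
    "Large model text prompt (upper bound)": 8.0,
    "AI video generation (5 seconds)": 944.0,
}

for task, wh in TASKS_WH.items():
    for device, watts in APPLIANCE_WATTS.items():
        print(f"{task}: {seconds_of_use(wh, watts):,.1f} s of a {device}")
```

The same prompt looks ten times more or less costly depending on the wattage assumed, so a comparison of this kind only means something when the appliance's power draw is stated.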

📹 Video Generation: A Different Category

The energy cost of generating a 5-second AI video (~944 Wh, per MIT Technology Review 2025) is not a typing error. High-resolution video generation involves running a very large diffusion model many times over — the equivalent of generating hundreds or thousands of images sequentially and combining them. This figure comes from measuring open-source video generation models; proprietary models like Sora may differ.

For context: 944 Wh is roughly one-seventh of what an average South African household uses in a day (around 6–7 kWh) — a meaningful amount for a few seconds of video, but still a fraction of daily household use.

🤖 The Agentic Multiplier: Tools Like Claude Code

The figures above describe a single prompt: you ask, the model answers once. But the way many researchers now use AI is very different. Agentic tools — Claude Code, Codex, Cursor's agent mode, Gemini CLI — do not answer once. They run a loop: read files, call tools, run code, read the output, reason about it, and call the model again, often dozens of times to complete one task. Energy use scales roughly with the total number of tokens the model processes, so an agentic task can cost many times more than a single chat prompt. This section tries to put bounds on "how much more" — and, as with everything in this lesson, the honest answer is a wide range.

⚠️ This is an estimate built on other estimates

Every figure in this section is derived by chaining together numbers from earlier in the lesson with practitioner reports and vendor documentation. None of it is a direct measurement of a research workflow. We present each input with its source and give a central estimate with lower and upper bounds so you can see exactly where the uncertainty enters. Treat the final numbers as order-of-magnitude guides, not precise figures.

🎚️ What "effort" Means in Claude Code

Claude's API exposes an effort setting that controls how many tokens the model is willing to spend on a task. Anthropic's documentation describes it as "a behavioral signal, not a strict token budget" — so it does not map onto a fixed energy figure, but it does directly change token use, and therefore energy. Crucially, effort affects all tokens in a response, including tool calls: lower effort means the model makes fewer tool calls and writes less; higher effort means more exploration, more tool calls, and deeper reasoning.

Effort level	What Anthropic's docs say it is for	Illustrative energy multiplier (relative to `high` = 1×)
low	Simplest tasks, fastest, lowest cost; "scopes its work to what was asked"	~0.3–0.5×
medium	Balanced; the "drop-in for the average workflow" where you want to reduce cost	~0.6–0.8×
high (API default)	Complex reasoning and agentic tasks; the "sweet spot" balancing quality and tokens	1× (reference)
xhigh (Opus 4.7)	Long-running agentic/coding tasks over 30 minutes; "token budgets in the millions"; "meaningfully higher token usage than high"	~2–5×
max	"No constraints on token spending"; reserve for frontier problems — "significant cost for relatively small quality gains" and can "overthink"	~3–10×

Important: The "multiplier" column is our own illustrative estimate, inferred from the qualitative descriptions in Anthropic's documentation. Anthropic does not publish numeric energy or token multipliers for the effort levels. The short quoted phrases describing each level come directly from Anthropic's Effort documentation; the numbers attached to them do not. We include the multipliers only to make the rest of the calculation concrete — they could easily be off by a factor of two or more.

🔢 The Calculation Ledger: Every Input and Its Source

To estimate the footprint of a working day with an agentic tool, we need three quantities, each with a central value, a range, and a source. (How many tasks make up a working day is a separate scenario assumption, stated below the table.)

Input	Lower	Central	Upper	Source & reasoning
A. Energy per model call	~1 Wh	~3 Wh	~8 Wh	This lesson's table: 0.34 Wh (OpenAI, reported best-case) up to 8 Wh (MIT Tech Review 2025, large open-weight model). Agentic calls carry long contexts (file contents, tool outputs, accumulated reasoning), so they sit toward the upper end — we centre on ~3 Wh.
B. Compute per task (simple-prompt-equivalents)	~10×	~30×	~100×	An agentic task is many model calls on large contexts. Analyses converge on roughly 5–30× the tokens of a single chat interaction for typical agents, rising to 100× or more for complex coding workflows (the Stanford Digital Economy Lab; SWE-bench-style coding tasks average 1–3.5M tokens/task). We take 10–100× as a central band — possibly conservative at the top. Anthropic's docs confirm the direction ("token budgets in the millions" at xhigh) but give no number.
C. Grid carbon intensity	0.386 kg CO₂/kWh (US grid average); ~0.9 kg CO₂/kWh (South Africa)			US figure as used elsewhere in this lesson; SA figure from the 2022 Grid Emission Factors Report (~0.87–1.01 kg/kWh depending on methodology; SA's grid is ~80% coal).

📊 Putting the Inputs Together: A Day on `high` Effort

Multiplying A × B gives the energy of one task. For a working day we assume ~50 substantial, compute- and loop-intensive tasks, not quick one-shot lookups, which cost far less. We hold the per-call energy (A) near its ~3 Wh midpoint and let the per-task multiplier (B) drive the range; applying the grid carbon intensity (C) then converts energy to CO₂. The central product is 3 Wh × 30 = 90 Wh, which the table rounds to ~100 Wh before scaling up to a day. (Letting A vary across its full 1–8 Wh band as well would widen these numbers further in both directions.)

Quantity	Lower	Central estimate	Upper
Energy per agentic task (A × B)	~30 Wh	~100 Wh	~300 Wh
Energy for a 50-task day (× 50)	~1.5 kWh	~5 kWh	~15 kWh
CO₂ — US grid (× 0.386)	~0.6 kg	~1.9 kg	~5.8 kg
CO₂ — South African grid (× 0.9)	~1.4 kg	~4.5 kg	~13.5 kg

For perspective, the central ~5 kWh for a full day of agentic coding is roughly five times the 944 Wh of a single 5-second AI video, and more than ten thousand times a single reported text prompt. It is no longer small next to household use either: it approaches the 6–7 kWh that an average South African household uses in a day. The footprint of this kind of AI use comes from the loop, not from any one prompt.
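The day-level table is plain multiplication, and writing it out shows where the rounding enters. The sketch below uses only the ledger's inputs: A held at 3 Wh, B at 10×, 30× and 100×, 50 tasks per day, and the two grid intensities. The function and variable names are our own. The unrounded central case comes out at 4.5 kWh and about 1.7 kg CO₂ on the US grid, slightly below the table's rounded ~5 kWh and ~1.9 kg.

```python
ENERGY_PER_CALL_WH = 3.0  # input A, held at its central value
COMPUTE_MULTIPLIER = {"lower": 10, "central": 30, "upper": 100}  # input B
TASKS_PER_DAY = 50  # scenario assumption from the text
GRID_KG_PER_KWH = {"US": 0.386, "South Africa": 0.9}  # input C


def task_energy_wh(multiplier: float, per_call_wh: float = ENERGY_PER_CALL_WH) -> float:
    """Energy of one agentic task: A x B, in watt-hours."""
    return per_call_wh * multiplier


def day_energy_kwh(multiplier: float, tasks: int = TASKS_PER_DAY) -> float:
    """Energy of a working day of agentic tasks, in kilowatt-hours."""
    return task_energy_wh(multiplier) * tasks / 1000.0


def day_co2_kg(multiplier: float, grid: str) -> float:
    """Convert a day's energy to kg CO2 using the chosen grid intensity."""
    return day_energy_kwh(multiplier) * GRID_KG_PER_KWH[grid]


for case, mult in COMPUTE_MULTIPLIER.items():
    print(
        f"{case:>7}: {task_energy_wh(mult):5.0f} Wh/task, "
        f"{day_energy_kwh(mult):4.1f} kWh/day, "
        f"{day_co2_kg(mult, 'US'):4.1f} kg CO2 (US), "
        f"{day_co2_kg(mult, 'South Africa'):4.1f} kg CO2 (SA)"
    )
```

Changing `ENERGY_PER_CALL_WH` to 1.0 or 8.0 shows how much wider the band becomes once A is allowed to vary too.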

🚗 How Does This Compare to Driving a Car?

A "typical passenger vehicle" emits about 400 g CO₂ per mile ≈ 0.25 kg CO₂ per km (US EPA). Dividing the day's emissions by that figure gives an equivalent driving distance:

On the US grid: a full day ≈ ~8 km of driving (range ~2–23 km)
On South Africa's coal-heavy grid: a full day ≈ ~18 km of driving (range ~5–54 km)

In other words, a researcher's central-estimate day on high effort is comparable to a short commute. (Newer petrol cars emit closer to 0.12–0.17 kg/km, which would raise these equivalent distances by roughly 1.5–2×; the car comparison is itself a range.)
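The driving equivalence is a single division, sketched below with the day-level CO₂ figures and the 0.25 kg/km EPA value from the text. The dictionary layout and function name are our own, and the second loop simply re-runs the central US case with the newer-car figures quoted above.

```python
CAR_KG_PER_KM = 0.25  # US EPA "typical passenger vehicle", as quoted in the text


def equivalent_km(co2_kg: float, car_kg_per_km: float = CAR_KG_PER_KM) -> float:
    """Distance a car would drive to emit the same mass of CO2."""
    return co2_kg / car_kg_per_km


# (lower, central, upper) kg CO2 per day, from the day-level table.
DAY_CO2_KG = {
    "US grid": (0.6, 1.9, 5.8),
    "South African grid": (1.4, 4.5, 13.5),
}

for grid, (low, central, high) in DAY_CO2_KG.items():
    print(
        f"{grid}: ~{equivalent_km(central):.0f} km "
        f"(range ~{equivalent_km(low):.0f}-{equivalent_km(high):.0f} km)"
    )

# Sensitivity to the car itself: newer petrol cars at 0.12-0.17 kg/km.
for car in (0.12, 0.17):
    print(f"US central day at {car} kg/km: ~{equivalent_km(1.9, car):.0f} km")
```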

⚠️ The Ceiling: `xhigh` and `max` Effort

The day above assumes high effort. Pushing every task to xhigh or max (Anthropic reserves max for "genuinely frontier problems" and warns that it adds "significant cost for relatively small quality gains") could multiply token use a further ~3–10× (our illustrative figure for max). Applied to the central ~5 kWh day, that puts a heavy max-effort day in the region of 15–50 kWh, or roughly 54–180 km of equivalent driving on the SA grid. But note that effort is a ceiling, not a floor: even on max, a simple request still resolves cheaply, because the model spends tokens roughly in proportion to a task's difficulty. These figures assume 50 genuinely compute- and loop-intensive tasks; a day padded with quick lookups would land well below them. The practical lesson is the same one Anthropic's own guidance gives: do not run at maximum effort by default. Match the effort to the task, both for cost and for footprint.
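To see how the ceiling is reached, the sketch below scales the central ~5 kWh day by the ~3–10× max-effort multiplier and converts the result to SA-grid emissions and driving distance. The multiplier is this lesson's own illustrative figure, not a number published by Anthropic, and the function name is hypothetical.

```python
CENTRAL_DAY_KWH = 5.0  # central high-effort day, from the table above
MAX_EFFORT_MULTIPLIER = (3.0, 10.0)  # illustrative range; NOT published by Anthropic
SA_GRID_KG_PER_KWH = 0.9  # South African grid, input C
CAR_KG_PER_KM = 0.25  # US EPA typical passenger vehicle


def max_effort_day(multiplier: float) -> tuple[float, float, float]:
    """Return (energy in kWh, kg CO2 on the SA grid, equivalent km driven)."""
    energy_kwh = CENTRAL_DAY_KWH * multiplier
    co2_kg = energy_kwh * SA_GRID_KG_PER_KWH
    km = co2_kg / CAR_KG_PER_KM
    return energy_kwh, co2_kg, km


for mult in MAX_EFFORT_MULTIPLIER:
    energy_kwh, co2_kg, km = max_effort_day(mult)
    print(f"{mult:.0f}x: {energy_kwh:.0f} kWh, {co2_kg:.1f} kg CO2, ~{km:.0f} km")
```

Because the multiplier is a guess, the output is best read as the shape of the scaling rather than as a forecast.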

💡 What This Means for You as a Researcher

The per-prompt footprint of AI is genuinely small. The footprint of agentic AI is larger — not because any single step is expensive, but because there are so many steps. The same property that makes these tools powerful for research (they keep working autonomously) is what drives their energy use.

This connects directly to the rebound problem we examine in the next session: as agentic tools make each task cheaper and easier, we tend to run far more of them. The individual cost falls; the total can still rise. The responsible move is not to avoid these tools, but to use the right effort level for the job, and to be honest — as we have tried to be here — about how uncertain the numbers really are.

✈️ A Worked Example: The Flight Comparison

A common comparison in media coverage: how does using AI compare to taking a transatlantic flight? Let's work through this carefully — because how you do the calculation matters enormously.

🔢 Step-by-Step Calculation

Reference point: one economy-class transatlantic flight (London → New York, ~5,540 km)

CO₂ per passenger (direct emissions only): ~0.5 tonnes
Including radiative forcing at altitude (contrails, water vapour): roughly doubles the climate impact
Commonly used estimate: ~1 tonne CO₂e per passenger
Note: figures from different calculators range from 0.5 to 1.5 tonnes — this is itself uncertain

Carbon per ChatGPT text prompt:

Energy assumption	Source	CO₂ per prompt (US grid avg: 0.386 kg/kWh)	Prompts to match 1-tonne flight
0.34 Wh	OpenAI (reported, 2025)	0.13 g CO₂	~7.6 million
2 Wh	Mid-range independent estimate	0.77 g CO₂	~1.3 million
8 Wh	Large model, independent measurement	3.1 g CO₂	~323,000

What this tells us: Depending entirely on which figures you use, one transatlantic flight equals somewhere between 300,000 and 7.6 million ChatGPT text queries. This is not a rounding error — it reflects the difference between company-optimised infrastructure and real-world large-model deployments, as well as genuine uncertainty about what "a ChatGPT query" even means in terms of model size and compute.
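The table can be reproduced in a few lines, which also makes it easy to test other assumptions. This is a sketch using the text's inputs (0.386 kg CO₂/kWh, a 1-tonne flight, and the three per-prompt energies); the function names are our own.

```python
US_GRID_KG_PER_KWH = 0.386  # US grid average, as used in the table
FLIGHT_TONNES = 1.0  # commonly used CO2e estimate per passenger


def grams_co2_per_prompt(energy_wh: float, grid_kg_per_kwh: float = US_GRID_KG_PER_KWH) -> float:
    """Wh x (kg/kWh) equals grams: the two factors of 1000 cancel."""
    return energy_wh * grid_kg_per_kwh


def prompts_per_flight(
    energy_wh: float,
    flight_tonnes: float = FLIGHT_TONNES,
    grid_kg_per_kwh: float = US_GRID_KG_PER_KWH,
) -> float:
    """Number of prompts whose emissions match one flight."""
    flight_grams = flight_tonnes * 1_000_000
    return flight_grams / grams_co2_per_prompt(energy_wh, grid_kg_per_kwh)


for energy_wh in (0.34, 2.0, 8.0):
    print(
        f"{energy_wh:>4} Wh: {grams_co2_per_prompt(energy_wh):.2f} g CO2/prompt, "
        f"{prompts_per_flight(energy_wh):,.0f} prompts per flight"
    )
```

Letting the flight figure vary across its own 0.5–1.5 tonne range, for example `prompts_per_flight(8.0, flight_tonnes=0.5)` and `prompts_per_flight(0.34, flight_tonnes=1.5)`, widens the span to roughly 160,000 to 11 million prompts.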

💡 Why the Range Is the Lesson

The fact that this calculation can produce answers more than an order of magnitude apart (a factor of about 24 between the extremes) is not a failure of the analysis; it is the most important finding. It tells us that glib comparisons ("ChatGPT = X flights per day") depend almost entirely on unstated assumptions, and that the opacity of AI companies makes it impossible to pin down a single honest answer.

As researchers, the appropriate response is not to pick the number that fits our preferred narrative, but to present the range honestly and to push for better disclosure.

📄 Source for the energy figures above

MIT Technology Review (2025): "We did the math on AI’s energy footprint. Here’s the story you haven’t heard." — independent measurements of open-source models providing a useful counterpoint to company-reported figures.

Luccioni et al. (2023): "Power Hungry Processing: Watts Driving the Cost of AI Deployment?" — benchmarks inference energy across a wide range of NLP tasks and model types.

🌍 Zooming Out: AI’s Share of Global Emissions

The per-query comparison above is useful for personal calibration — but it can’t tell you what share of global emissions AI actually represents. For that, you have to layer estimates from primary sources and be honest about what is measured versus modelled.

📊 A back-of-envelope: AI vs aviation, with sources

Step 1 — Inputs (primary sources):

Data centres ≈ 1.5% of global electricity (≈ 415 TWh in 2024) — IEA, Energy and AI (2025).
AI-focused share ≈ 0.5% of global electricity (2025) — Hannah Ritchie’s estimate from the IEA inputs, Sustainability by Numbers (2026).
Electricity & heat ≈ 40% of energy-related CO₂; energy-related CO₂ ≈ 85–90% of total CO₂ — IEA / Our World in Data sectoral breakdown.

Step 2 — Calculation: If AI’s electricity were at average grid carbon intensity, AI’s share of energy-related CO₂ ≈ 0.5% × 40% ≈ 0.2%. AI loads are concentrated in the US (gas-heavy) and China (coal-heavy), pushing intensity above average; the largest operators’ renewable and nuclear procurement pulls below it (though annual matching overstates hourly cleanliness). Net plausible range: 0.2–0.4% of energy-related CO₂, or roughly 0.15–0.35% of total CO₂ / GHG depending on denominator.

Step 3 — The aviation benchmark. Aviation accounted for 2.5% of global energy-related CO₂ in 2023 (≈ 950 Mt) — IEA, Aviation. Passenger flights are ≈ 81% of that and freight ≈ 19%, so passenger aviation ≈ 2% of global CO₂.

Headline: Passenger air travel emits roughly 5–10× what AI does operationally right now. Equivalently, AI is around a tenth to a fifth of passenger aviation’s footprint at the system level.
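Steps 1 to 3 reduce to a handful of multiplications, sketched below with the shares quoted above. The `RELATIVE_INTENSITY` factor (AI electricity at 1× to 2× the average grid carbon intensity) is our own assumption, chosen only because it reproduces the text's 0.2–0.4% range; it is not a sourced figure.

```python
AI_SHARE_OF_ELECTRICITY = 0.005  # Ritchie's estimate from IEA inputs
ELECTRICITY_SHARE_OF_ENERGY_CO2 = 0.40  # electricity and heat
AVIATION_SHARE_OF_ENERGY_CO2 = 0.025  # IEA, 2023
PASSENGER_FRACTION = 0.81  # passenger share of aviation emissions
RELATIVE_INTENSITY = (1.0, 2.0)  # ASSUMPTION: AI grid intensity vs. the average


def ai_share_of_energy_co2(relative_intensity: float) -> float:
    """AI's share of energy-related CO2 for a given relative grid intensity."""
    return AI_SHARE_OF_ELECTRICITY * ELECTRICITY_SHARE_OF_ENERGY_CO2 * relative_intensity


passenger_share = AVIATION_SHARE_OF_ENERGY_CO2 * PASSENGER_FRACTION

for factor in RELATIVE_INTENSITY:
    ai_share = ai_share_of_energy_co2(factor)
    print(
        f"intensity {factor:.0f}x average: AI = {ai_share:.2%}, "
        f"passenger aviation = {passenger_share:.2%}, "
        f"ratio = {passenger_share / ai_share:.1f}x"
    )
```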

Two caveats sharpen the comparison further:

Aviation’s real impact is larger than the CO₂ alone. Non-CO₂ effects (contrails and NOₓ at altitude) push aviation’s share of anthropogenic warming closer to 3.5–4% (Lee et al. 2021, Atmospheric Environment). The gap on a warming basis is wider than the gap on a CO₂ basis.
Aviation is concentrated; AI use is broadly distributed. Only ≈ 11% of the world’s population flew in 2018, and the most frequent ≈ 1% account for more than half of passenger-flight emissions (Gössling & Humpe 2020, Global Environmental Change). AI use is spread across far more people, so its per-person footprint looks quite different from that of flying, even when the system-level numbers sit in the same neighbourhood.

And the ratio is narrowing. The IEA projects data-centre electricity roughly doubling from ≈ 485 TWh (2025) to ≈ 950 TWh (2030, ≈ 3% of global electricity), with the AI-focused slice tripling — moving AI from ≈ one-third toward ≈ half of data-centre load. Even so, the IEA projects all data-centre CO₂ at only ≈ 1% of energy-related CO₂ by 2030. AI would need several-fold growth relative to aviation to close the gap.
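The "one-third toward half" statement follows directly from the two growth rates, as this short sketch shows. It uses only the figures quoted above (485 TWh, 950 TWh, a one-third starting share, and a tripling of the AI slice); the variable names are our own.

```python
DATA_CENTRE_TWH_2025 = 485.0  # IEA figure quoted in the text
DATA_CENTRE_TWH_2030 = 950.0  # IEA projection quoted in the text
AI_SHARE_2025 = 1.0 / 3.0  # "one-third" of data-centre load
AI_GROWTH_FACTOR = 3.0  # "the AI-focused slice tripling"

ai_twh_2025 = DATA_CENTRE_TWH_2025 * AI_SHARE_2025
ai_twh_2030 = ai_twh_2025 * AI_GROWTH_FACTOR
ai_share_2030 = ai_twh_2030 / DATA_CENTRE_TWH_2030

print(f"AI load 2025: ~{ai_twh_2025:.0f} TWh ({AI_SHARE_2025:.0%} of data centres)")
print(f"AI load 2030: ~{ai_twh_2030:.0f} TWh ({ai_share_2030:.0%} of data centres)")
```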

⚠️ How to read these numbers

These are layered estimates, not audited measurements. The 0.2–0.4% calculation rests on three assumptions, each with its own uncertainty: (a) the IEA’s 1.5% data-centre figure, (b) Hannah Ritchie’s one-third allocation to AI specifically, and (c) average grid intensity. Treat the headline as a well-reasoned estimate from primary sources, not as a measurement — and the same scepticism the rest of this lesson asks you to apply to vendor claims applies to careful third-party estimates too.

The figures also exclude embodied emissions: chip fabrication, server manufacture, data-centre construction (concrete, steel). Session 3.3 returns to those in detail, because lifecycle accounting can raise the operational number materially. And the definition of “AI” versus general accelerated compute is itself fuzzy: not every workload on a GPU is what most people mean by AI.

💧 The Water Footprint

Energy gets most of the attention, but water is the environmental cost of AI that is least reported and least understood. Data centres are thirsty in two distinct ways.

Direct Water Use: Cooling

Modern data centres generate enormous amounts of heat. The most common solution is evaporative cooling — running water over heat exchangers where it evaporates, carrying heat away.

Why water, not air? Evaporative cooling is far more energy-efficient than air cooling for the temperatures involved
Why filtered municipal water? Minerals in unfiltered water would corrode and clog sensitive hardware — most large data centres use high-quality drinking water
Estimate: A 100 MW data centre may consume the equivalent of ~2,600 households' daily water use (IEA estimate)
US total: Data centres directly consume roughly 17.5 billion gallons of water per year — about 0.3% of the US public water supply (Lawrence Berkeley National Laboratory, 2024)

Indirect Water Use: Electricity

Generating electricity also consumes water — at the power plant, not the data centre. This "indirect" water use is often larger than the direct cooling water.

Thermal power plants (coal, gas, nuclear) use water for steam generation and cooling
This water is often not returned to its source — it is evaporated or discharged at a different temperature
A commonly cited estimate (Shaolei Ren, UC Riverside): a ~30-turn ChatGPT conversation = roughly 500 ml of water, of which only 12–13% is direct cooling water; the rest is from electricity generation
Geographic sensitivity: Water extracted in arid regions has very different consequences from the same volume extracted in water-abundant areas
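The split between direct and indirect water in the commonly cited estimate can be worked out per conversation and per turn. This is a sketch using only the figures in the list above (500 ml, ~30 turns, 12–13% direct); it inherits all of that estimate's uncertainty, and the function name is our own.

```python
CONVERSATION_ML = 500.0  # commonly cited estimate for one conversation
TURNS = 30  # approximate turns per conversation
DIRECT_SHARE = (0.12, 0.13)  # share that is on-site cooling water


def water_split_ml(direct_share: float, total_ml: float = CONVERSATION_ML) -> tuple[float, float]:
    """Return (direct cooling water, indirect electricity water) in millilitres."""
    direct = total_ml * direct_share
    return direct, total_ml - direct


print(f"Per turn: ~{CONVERSATION_ML / TURNS:.1f} ml in total")
for share in DIRECT_SHARE:
    direct, indirect = water_split_ml(share)
    print(f"direct share {share:.0%}: {direct:.0f} ml cooling, {indirect:.0f} ml via electricity")
```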

⚠️ Treating the 500 ml figure with care

The "500 ml per 30-turn conversation" figure (from Li et al., 2023 / Shaolei Ren) became widely cited in media coverage, often without key caveats: it is an estimate based on assumed data centre locations (US-average), a specific model generation (GPT-3 era), and indirect water attribution methods that are contested. Modern data centres vary widely in water efficiency — some use closed-loop systems that recirculate water rather than evaporating it. Treat this as an order-of-magnitude estimate, not a precise measurement. The original paper acknowledges significant uncertainty.

📄 Key Reading: AI Water Footprint

Li, Yang, Islam, Ren (2023): "Making AI Less Thirsty" — the most-cited quantitative analysis of AI water consumption. Read the paper, not just the media coverage of it.

📊 Putting It at Scale

Individual query costs only matter if we multiply them by usage. Let's consider what the aggregate picture looks like.

🌍 AI Usage at Global Scale (2025)

Metric	Figure	Source / Note
ChatGPT queries per day	~2.5 billion	OpenAI (reported to Axios, 2025)
Organisations using AI	78% of surveyed organisations	Stanford AI Index, 2025
US data centre electricity (2023)	~176 TWh/year	Lawrence Berkeley National Laboratory, 2024 (roughly tripled from ~58 TWh in 2014)
Projected AI data centre electricity (2028)	250–400 TWh/year	Various analysts; high uncertainty
US household electricity reference	~1,200 TWh/year total	US EIA; for comparison purposes

If AI data centres reach 400 TWh by 2028, that would be roughly equivalent to one-third of all US household electricity consumption. These projections carry very high uncertainty — they depend on assumptions about AI adoption rates, hardware efficiency improvements, and grid composition.

💡 Training vs. Inference: Where the Energy Goes

Much early coverage of AI's environmental cost focused on the energy required to train a model — the one-time compute cost of creating GPT-4 or Claude. But for widely-deployed models, inference — running the model to answer queries — can easily exceed training energy over the model's lifetime.

With 2.5 billion queries per day, the cumulative inference cost of ChatGPT dwarfs the one-time training cost within months of deployment. This is why statements like "training GPT-3 emitted X tonnes of CO₂" need to be placed in the context of ongoing inference costs.
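A rough break-even calculation shows why inference overtakes training so quickly. In the sketch below the query volume and per-query energies come from this lesson, but `HYPOTHETICAL_TRAINING_GWH` is a placeholder we chose purely for illustration: no company has disclosed the training energy of its current frontier models. The sketch also assumes every query costs the same energy, which is a simplification.

```python
QUERIES_PER_DAY = 2.5e9  # OpenAI, as reported in the table above
PER_QUERY_WH = {
    "reported (OpenAI)": 0.34,
    "mid-range independent": 2.0,
    "large model, independent": 8.0,
}
HYPOTHETICAL_TRAINING_GWH = 50.0  # ASSUMPTION: placeholder, not a disclosed figure


def daily_inference_gwh(per_query_wh: float, queries: float = QUERIES_PER_DAY) -> float:
    """Total inference energy per day in GWh (1 GWh = 1e9 Wh)."""
    return per_query_wh * queries / 1e9


def days_to_match_training(
    per_query_wh: float, training_gwh: float = HYPOTHETICAL_TRAINING_GWH
) -> float:
    """Days of inference needed to equal the one-time training energy."""
    return training_gwh / daily_inference_gwh(per_query_wh)


for label, wh in PER_QUERY_WH.items():
    print(
        f"{label}: {daily_inference_gwh(wh):.2f} GWh/day, "
        f"matches training after ~{days_to_match_training(wh):.0f} days"
    )
```

Under these assumptions the break-even falls anywhere from a few days to about two months, which is consistent with the "within months" claim while showing how sensitive it is to the per-query figure.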

🔬 Try this yourself (about 10 minutes)

This page argues for calibrated numbers over both panic and dismissal. Here is a quick way to feel both the estimate and the opacity for yourself, in two short steps.

▸Put a number on something real. Take one computation or AI task you actually ran recently and estimate its footprint — either with this lesson's per-interaction figures, or with a free calculator such as Green Algorithms (we return to these tools in 3.4). Notice how much the answer depends on assumptions you had to choose.
▸Now hit the transparency wall. Pick one AI assistant you use and try to find its official per-query energy or water figure — a number from the company you could actually cite. Spend five minutes and then stop. The difficulty you just experienced is the “transparency problem” this page opened with: the figures you can cite are mostly third-party estimates, not disclosed measurements.

The point is not a precise total. It is to come away holding a defensible order-of-magnitude and an honest sense of how much the vendors are not telling you.

📚 Summary & Key Takeaways

Before we can have a productive conversation about AI's environmental impact, we need to understand the numbers — and their limits:

Corporate opacity is the fundamental problem: AI companies don't disclose the data needed to calculate their environmental footprint, so all estimates involve assumptions
Text vs. video is not comparable: Video generation uses roughly 1,000× more energy than text generation — these are different categories of use
The flight comparison spans more than an order of magnitude: 300,000 to 7.6 million queries per flight, depending on model size and whose figures you trust
Water is underreported: Both direct cooling and indirect electricity generation consume significant water, with geographic implications
Scale is what matters: Individual query costs are small; the total across billions of queries per day is not
Inference, not just training: The ongoing cost of running models at scale quickly exceeds one-time training costs
Agentic tools cost more than single prompts: A day of work with a tool like Claude Code (~50 tasks on high effort) is on the order of ~5 kWh — central estimate ~1.9 kg CO₂ (US grid) / ~4.5 kg (SA grid), comparable to driving ~8–18 km. The cost comes from the loop of many calls, not any one prompt, and pushing to max effort can multiply it several-fold

Next session (Week 3.2): We zoom out from individual queries to the infrastructure level — where does the electricity come from, how does manufacturing hardware fit in, and why do efficiency gains so often fail to reduce total energy use?